Dominance properties for Divisible MapReduce Computations
Abstract
In this paper we analyze MapReduce distributed computations as a divisible load scheduling problem. The two operations of mapping and reducing can be understood as two divisible applications with precedence constraints. A divisible load model is proposed, and schedule dominance properties are analyzed. We investigate dominant schedule structures for MapReduce computations. To the best of our knowledge, this is the first time that the processing of divisible loads with precedence constraints has been considered within divisible load theory.
Similar resources
Security and Privacy Aspects in MapReduce on Clouds: A Survey
MapReduce is a programming system for distributed processing of large-scale data in an efficient and fault-tolerant manner on a private, public, or hybrid cloud. MapReduce is extensively used daily around the world as an efficient distributed computation tool for a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern ma...
Google's MapReduce programming model - Revisited
Google’s MapReduce programming model serves for processing and generating large data sets in a massively parallel manner (subject to a suitable implementation of the model). We deliver the first rigorous description of the model. To this end, we reverse-engineer the seminal MapReduce paper and we capture our observations, assumptions and recommendations as an executable specification. We also i...
MapReduce with Deltas
The MapReduce programming model is extended conservatively to deal with deltas for input data such that recurrent MapReduce computations can be more efficient for the case of input data that changes only slightly over time. That is, the extended model enables more frequent re-execution of MapReduce computations and thereby more up-to-date results in practical applications. Deltas can also be pu...
A Computational Model for Mapreduce Job Flow
Massive quantities of data are today processed using parallel computing frameworks that parallelize computations on large distributed clusters consisting of many machines. Such frameworks are adopted in big data analytics tasks such as recommender systems, social network analysis, and legal investigation, which involve iterative computations over large datasets. One of the most used frameworks is MapReduce,...
RDFPath: Path Query Processing on Large RDF Graphs with MapReduce
The MapReduce programming model has gained traction in different application areas in recent years, ranging from the analysis of log files to the computation of the RDFS closure. Yet, for most users the MapReduce abstraction is too low-level since even simple computations have to be expressed as Map and Reduce phases. In this paper we propose RDFPath, an expressive RDF path query language geare...